Test-driven Assessment of [R2]RML Mappings to Improve Dataset Quality

Authors

  • Anastasia Dimou
  • Dimitris Kontokostas
  • Markus Freudenberg
  • Ruben Verborgh
  • Jens Lehmann
  • Erik Mannens
  • Sebastian Hellmann
  • Rik Van de Walle
Abstract

RDF dataset quality assessment is currently performed primarily after data is published. Incorporating its results, by applying corresponding adjustments to the dataset, happens manually and occurs rarely. In the case of (semi-)structured data (e.g., CSV, XML), the root of the violations often derives from the mappings that specify how the RDF dataset will be generated. Thus, we suggest shifting the quality assessment from the RDF dataset to the mapping definitions that generate it. The proposed test-driven approach for assessing mappings relies on RDFUnit test cases applied over mappings specified with RML. Our evaluation is applied to different cases, e.g., DBpedia, and indicates that the overall quality of an RDF dataset is quickly and significantly improved.
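
The idea of testing the mapping rather than the generated data can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation and not the RDFUnit or RML API: plain Python dictionaries stand in for a mapping and a schema, every name (`SCHEMA_DOMAINS`, `MAPPING`, `find_domain_violations`, the `ex:` terms) is hypothetical, and the check compares classes by exact match without any subclass reasoning.

```python
# Toy illustration only: plain dicts stand in for a mapping and a schema.
# None of these names come from RML, RDFUnit, or the paper.

# Hypothetical schema: property -> class it declares as its domain.
SCHEMA_DOMAINS = {
    "ex:birthDate": "ex:Person",
    "ex:population": "ex:City",
}

# Hypothetical mapping: each rule assigns one class to the subjects it
# generates and lists the predicates it attaches to them.
MAPPING = [
    {"name": "PersonMap", "subject_class": "ex:Person",
     "predicates": ["ex:birthDate"]},
    {"name": "CityMap", "subject_class": "ex:City",
     "predicates": ["ex:population", "ex:birthDate"]},
]


def find_domain_violations(mapping, schema_domains):
    """Return (rule name, predicate, expected class) for each mismatch.

    The check runs once per mapping rule, so a single faulty rule is
    reported once, however many triples it would later generate.
    """
    violations = []
    for rule in mapping:
        for predicate in rule["predicates"]:
            expected = schema_domains.get(predicate)
            # Exact-match comparison; a real check would also accept subclasses.
            if expected is not None and expected != rule["subject_class"]:
                violations.append((rule["name"], predicate, expected))
    return violations


violations = find_domain_violations(MAPPING, SCHEMA_DOMAINS)
for name, predicate, expected in violations:
    print(f"{name}: {predicate} expects subjects of class {expected}")
```

In this sketch the faulty `CityMap` rule is flagged before any data is generated, which is the shift in assessment that the abstract proposes.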


Similar resources

Assessing and Refining Mappings to RDF to Improve Dataset Quality

RDF dataset quality assessment is currently performed primarily after data is published. However, there is neither a systematic way to incorporate its results into the dataset nor the assessment into the publishing workflow. Adjustments are manually (but rarely) applied. Nevertheless, the root of the violations which often derive from the mappings that specify how the RDF dataset will be genera...


DBpedia Mappings Quality Assessment

The root of schema violations for RDF data generated from (semi-)structured data often derives from mappings, which are repeatedly applied and specify how an RDF dataset is generated. The DBpedia dataset, which derives from Wikipedia infoboxes, is no exception. To mitigate the violations, we proposed in previous work to validate the mappings which generate the data, instead of validating the g...


A PCA/ICA based Fetal ECG Extraction from Mother Abdominal Recordings by Means of a Novel Data-driven Approach to Fetal ECG Quality Assessment

Background: Fetal electrocardiography is a developing field that provides valuable information on fetal health during pregnancy. Early diagnosis and treatment of fetal heart problems give the infant a better chance of survival. Objective: Here, we extract fetal ECG from maternal abdominal recordings and detect R-peaks in order to recognize fetal heart rate. On the next step, we find a be...


GENETIC PROGRAMMING AND MULTIVARIATE ADAPTIVE REGRESION SPLINES FOR PRIDICTION OF BRIDGE RISKS AND COMPARISION OF PERFORMANCES

In this paper, two different data-driven models, genetic programming (GP) and multivariate adaptive regression splines (MARS), have been adopted to create models for predicting the bridge risk score. The input parameters for bridge risk consist of safe risk rating (SRR), functional risk rating (FRR), sustainability risk rating (SUR), environmental risk rating (ERR) and target output. The total ...


Real-time quality monitoring in debutanizer column with regression tree and ANFIS

A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for the debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs and the outp...



Publication date: 2015